Japanese-Chinese Cross-Language Information Retrieval: An Interlingua Approach

Authors

  • Md Maruf Hasan
  • Yuji Matsumoto
Abstract

Electronically available multilingual information can be divided into two major categories: (1) information in alphabetic languages (such as English) and (2) information in ideographic languages (such as Chinese). The amount of information available in non-English alphabetic languages, as well as in ideographic languages (especially Japanese and Chinese), has grown rapidly in recent years. Because Japanese and Chinese are written with ideographs, and because several encoding standards are in use for them, efficient processing (representation, indexing, retrieval, etc.) of such information has become a tedious task. In this paper, we propose a Han character (Kanji) oriented Interlingua model for indexing and retrieving Japanese and Chinese information. We report the results of mono- and cross-language information retrieval in a Kanji space, where documents and queries are represented as Kanji-oriented vectors. We also employ a dimensionality reduction technique to compute a Kanji Conceptual Space (KCS) from the initial Kanji space, which can facilitate conceptual retrieval of both mono- and cross-language information in these languages. Similar indexing approaches for multiple European languages are being intensively explored, either through term association (e.g., latent semantic indexing) or through conceptual mapping (using a lexical ontology such as WordNet). The Interlingua approach investigated here for Japanese and Chinese is similar to the term (or concept) association model investigated for European languages, and the two approaches can be easily integrated. Therefore, the proposed Interlingua model can pave the way for accessing and retrieving multilingual information efficiently and uniformly.
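The abstract describes two steps: representing documents and queries as Kanji-oriented vectors, then reducing that Kanji space to a Kanji Conceptual Space. The sketch below illustrates both steps under stated assumptions; it is not the authors' implementation.

- **Index terms.** Every character in the CJK Unified Ideographs block (U+4E00 to U+9FFF) is treated as an index term, weighted by its raw count.
- **Similarity.** Documents and queries are compared by cosine similarity.
- **Reduction.** The abstract does not name the dimensionality reduction technique. As a labeled assumption, the sketch uses an LSI-style truncated SVD, computed by power iteration with deflation.
- **Variant forms.** The sketch does not map variant character forms. For example, Japanese 検 and Traditional Chinese 檢 are different code points, so a real system would need a variant table.

The function names (`kanji_counts`, `to_vector`, `kanji_conceptual_space`, etc.) and the sample texts are hypothetical.

```python
import math
import re

# Assumption: an index term is any character in the CJK Unified Ideographs
# block (U+4E00..U+9FFF); kana, Latin letters and punctuation are ignored.
HAN = re.compile(r"[\u4e00-\u9fff]")


def kanji_counts(text):
    """Count each Han character in `text` (the only indexing unit used here)."""
    counts = {}
    for ch in HAN.findall(text):
        counts[ch] = counts.get(ch, 0) + 1
    return counts


def to_vector(text, vocab):
    """Represent `text` in the Kanji space spanned by `vocab`; unseen Kanji are dropped."""
    counts = kanji_counts(text)
    return [float(counts.get(t, 0)) for t in vocab]


def cosine(a, b):
    """Cosine similarity; 0.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def top_eigenpairs(m, k, iters=500):
    """Top-k eigenpairs of a symmetric PSD matrix (power iteration + deflation)."""
    n = len(m)
    m = [row[:] for row in m]  # deflation modifies the matrix, so work on a copy
    pairs = []
    for _ in range(k):
        v = [1.0 + i for i in range(n)]  # deterministic, all-positive start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(n)) for i in range(n)]
            lam = math.sqrt(sum(x * x for x in w))
            if lam < 1e-12:
                break
            v = [x / lam for x in w]
        if lam < 1e-12:  # no non-zero component left
            break
        pairs.append((lam, v))
        for i in range(n):  # deflate: M <- M - lam * v v^T
            for j in range(n):
                m[i][j] -= lam * v[i] * v[j]
    return pairs


def kanji_conceptual_space(doc_vectors, k):
    """Rank-k LSI-style reduction of the Kanji space; returns a folding-in function."""
    n_terms = len(doc_vectors[0])
    # A A^T, where A is the term-by-document matrix (one column per document).
    aat = [[sum(d[i] * d[j] for d in doc_vectors) for j in range(n_terms)]
           for i in range(n_terms)]
    # Eigenvalues of A A^T are squared singular values; eigenvectors form U_k.
    basis = [(math.sqrt(lam), u) for lam, u in top_eigenpairs(aat, k)]

    def fold_in(vec):
        # x_hat = Sigma_k^{-1} U_k^T x, used for documents and queries alike.
        return [sum(ui * xi for ui, xi in zip(u, vec)) / s for s, u in basis]

    return fold_in


# Hypothetical toy collection: Japanese documents, Traditional Chinese query.
docs_ja = ["多言語情報の研究", "情報検索システム", "日本料理の歴史"]
query_zh = "語言資訊"
vocab = sorted({ch for d in docs_ja for ch in kanji_counts(d)})
doc_vecs = [to_vector(d, vocab) for d in docs_ja]
q_vec = to_vector(query_zh, vocab)

kanji_scores = [cosine(q_vec, d) for d in doc_vecs]  # initial Kanji space
fold_in = kanji_conceptual_space(doc_vecs, k=2)
kcs_scores = [cosine(fold_in(q_vec), fold_in(d)) for d in doc_vecs]  # reduced space
print(kanji_scores, kcs_scores)
```

In this toy collection, the query shares Kanji (語 and 言) only with the first document. The second document (情報検索システム) therefore scores zero in the Kanji space. In the two-dimensional reduced space, it receives a high similarity to the query, because it shares 情 and 報 with the first document. The choice of two dimensions is purely illustrative; with only three documents, the reduction is very coarse.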

Similar resources

Is there Hope for Interlingua methods? A CLIR Comparison Experiment between Interlingua and Query Translation

A comparison of interlingua and query translation is proposed for a particular cross-language information retrieval (CLIR) application, which consists of retrieving a book from the collection by using one of its chapters, in a different language, as a query. The experiments are performed in three languages (English, Chinese and Spanish) and all the possible combinations. It is shown that interlingu...

CINDOR TREC-9 English-Chinese Evaluation

MNIS-TextWise Labs participated in the TREC-9 Chinese Cross-Language Information Retrieval track. The focus of our research for this participation has been on rapidly adding Chinese capabilities to CINDOR using tools for automatically generating a Chinese Conceptual Interlingua from existing lexical resources. For the TREC-9 evaluation we also built a version of our system which loosely integra...

Chinese-Japanese Cross Language Information Retrieval: A Han Character Based Approach

In this paper, we investigate cross language information retrieval (CLIR) for Chinese and Japanese texts utilizing the Han characters, the common ideographs used in writing Chinese, Japanese and Korean (CJK) languages. The Unicode encoding scheme, which encodes the superset of Han characters, is used as a common encoding platform to deal with the multilingual collection in a uniform manner. We discu...

How Similar are Chinese and Japanese for Cross-Language Information Retrieval?

For NTCIR Workshop 5, UC Berkeley participated in the bilingual task of the CLIR track. Our focus was on Chinese topic searches against the Japanese news document collection, and on Japanese topic searches against the Chinese news document collection. Extending our work from the NTCIR-4 workshop, we performed search experiments to segment and use Chinese search topics directly as if they were Japanese t...

An Experimental Assessment of Direct Versus Interlingual Translation for Cross-Language Information Retrieval

We introduce an interlingua-based approach to cross-language information retrieval, in which queries, as well as documents, are mapped onto a language-independent concept layer and retrieval operations are performed at the level of that interlingua. This approach is contrasted with one which operates without such an intermediary concept level. Non-English queries (German ones, in our experiments...

Journal:
  • IJCLCLP

Volume 5

Publication date: 2000